accuracy measure

Terms from Artificial Intelligence: humans at the heart of algorithms

Page numbers are for draft copy at present; they will be replaced with correct numbers when final book is formatted. Chapter numbers are correct and will not change now.

Although we use the single term accuracy in day-to-day speech, there are many different kinds of accuracy measures depending on the kind of data and application. Often the different measures conflict: getting the best accuracy on one metric means sacrificing accuracy on another, as with the precision–recall trade-off.

For numeric data the most common measure is the root mean square (RMS) error, in part because it has nice statistical properties; for example, linear regression is about finding the line through the data that minimises the RMS error. RMS is affected particularly strongly by a small number of extreme values, so the average absolute difference may be used instead. If we are interested in worst-case scenarios, the maximum difference may be more useful.
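As a minimal sketch (in Python, with made-up example values rather than data from the book), the three numeric measures can be computed directly; note how a single extreme prediction inflates the RMS error more than the average absolute difference.

```python
import numpy as np

# Illustrative values only: one prediction (9.0 vs 7.0) is far off the mark.
actual = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
predicted = np.array([2.8, 5.4, 2.0, 9.0, 4.6])

errors = predicted - actual

rms_error = np.sqrt(np.mean(errors ** 2))   # penalises large errors heavily
mean_abs_error = np.mean(np.abs(errors))    # less sensitive to extreme values
max_abs_error = np.max(np.abs(errors))      # worst-case difference

print(f"RMS error:           {rms_error:.3f}")
print(f"Mean absolute error: {mean_abs_error:.3f}")
print(f"Maximum difference:  {max_abs_error:.3f}")
```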

For classifications, even binary choices, the situation is yet more complex. Binary choices have two main kinds of error: false positives, when we assign something to a class (say a disease diagnosis) but it is actually not in the class, and false negatives, when we fail to recognise a true diagnosis. If the probability of a false positive is low we have high precision, and if the probability of a false negative is low we have high recall; which we want depends on the relative costs of the different kinds of error. These are sometimes combined into a single measure, most commonly the F-score. If we have evidence (say a confidence measure from a machine learning algorithm) and use a threshold to determine our decisions, then increasing the threshold means we may have more false negatives, whereas reducing it means we have more false positives. The ROC curve visualises this trade-off.
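The sketch below (Python, with invented labels and confidence scores purely for illustration) computes precision, recall and the F-score at three thresholds, showing how raising the threshold trades recall for precision.

```python
import numpy as np

labels = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])   # true classes (invented)
scores = np.array([0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1, 0.7, 0.45])  # classifier confidence

def precision_recall_f(labels, scores, threshold):
    predicted = scores >= threshold
    tp = np.sum(predicted & (labels == 1))    # true positives
    fp = np.sum(predicted & (labels == 0))    # false positives
    fn = np.sum(~predicted & (labels == 1))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Raising the threshold reduces false positives (higher precision)
# but increases false negatives (lower recall).
for threshold in (0.3, 0.5, 0.7):
    p, r, f = precision_recall_f(labels, scores, threshold)
    print(f"threshold {threshold:.1f}: precision {p:.2f}, recall {r:.2f}, F-score {f:.2f}")
```

Sweeping the threshold over its full range and plotting the resulting error rates against each other gives the ROC curve mentioned above.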

Used in Chap. 9: pages 130, 139

Also known as accuracy metrics

ROC curve – trade-off between false positive and false negative rates